Building Domain Specific Bilingual Dictionaries
نویسندگان
چکیده
This paper proposes a method to build bilingual dictionaries for specific domains defined by a parallel corpora. The proposed method is based on an original method that is not domain specific. Both the original and the proposed methods are constructed with previously available natural language processing tools. Therefore, this paper contribution resides in the choice and parametrization of the chosen tools. To illustrate the proposed method benefits we conduct an experiment over technical manuals in English and Portuguese. The results of our proposed method were analyzed by human specialists and our results indicates significant increases in precision for unigrams and muli-grams. Numerically, the precision increase is as big as 15% according to our evaluation.
منابع مشابه
Automatic Methods for the Extension of a Bilingual Dictionary using Comparable Corpora
Bilingual dictionaries define word equivalents from one language to another, thus acting as an important bridge between languages. No bilingual dictionary is complete since languages are in a constant state of change. Additionally, dictionaries are unlikely to achieve complete coverage of all language terms. This paper investigates methods for extending dictionaries using non-aligned corpora, b...
متن کاملBuilding basic vocabulary across 40 languages
The paper explores the options for building bilingual dictionaries by automated methods. We define the notion ‘basic vocabulary’ and investigate how well the conceptual units that make up this language-independent vocabulary are covered by language-specific bindings in 40 languages.
متن کاملBuilding a Basque-Chinese Dictionary by Using English as Pivot
Bilingual dictionaries are key resources in several fields such as translation, language learning or various NLP tasks. However, only major languages have such resources. Automatically built dictionaries by using pivot languages could be a useful resource in these circumstances. Pivot-based bilingual dictionary building is based on merging two bilingual dictionaries which share a common languag...
متن کاملExtracting English-Korean Transliteration Equivalence from Domain-Specific Dictionaries
Automatic translation knowledge acquisition or automatic bilingual dictionary construction has become an important first step for natural language applications such as machine translation and cross-language information retrieval. Transliterations are used to translate proper names and technical terms especially from languages in Roman alphabets to languages in non-Roman alphabets such as from E...
متن کاملCombining Corpus and Machine - ReadableDictionary Data for Building Bilingual
This paper describes and discusses some theoretical and practical problems arising from developing a system to combine the structured but incomplete information from machine readable dictionaries (MRDs) with the unstructured but more complete information available in corpora for the creation of a bilingual lexical data base, presenting a methodology to integrate information from both sources in...
متن کامل